Reduce time it takes to import SGLang #12510

raayandhar · 2025-11-02T04:04:28Z

Motivation

We have noticed that importing SGLang takes a long time (#10492). I have been investigating what dominates the import time and whether there are simple ways to reduce it. From the original issue, we want to reduce the time of:

time python -c "from sglang.srt.managers.scheduler import Scheduler"

which is what I have focused on. However, I think there are also things we can do to reduce the time of other imports. This is more of a V1 to get feedback from community experts.

Modifications

There are some heavy imports; for example, importing the quantization methods at module level is expensive. By moving some imports to function level (so they only run when the function is actually used), we can reduce module import time. However, I can see how this can easily become an antipattern. It can even hurt performance: an import statement inside a frequently called function runs on every call, and although the module is cached after the first load, each call still pays for the lookup. I tried to do this only in functions that we expect to run once or a small number of times, but I understand the argument against this kind of code. I also don't think all of the changes to hf_transformers_utils.py help, and since they are somewhat invasive, I will take a deeper look.
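A minimal sketch of the function-level import pattern described above. colorsys is only a stand-in for a heavy dependency such as the quantization methods, and resolve_quant_method is a hypothetical name, not SGLang code. It shows both halves of the trade-off: defining the function imports nothing, and after the first call the module is cached in sys.modules, so later calls only pay for a cache lookup.

```python
import sys


def resolve_quant_method(name: str) -> str:
    """Hypothetical init-time helper that needs a heavy module.

    The import statement executes on every call, but only the first call
    actually loads the module; later calls find it in sys.modules and just
    rebind the local name. That residual lookup is why this pattern suits
    rarely called functions (e.g. object init), not hot loops.
    """
    import colorsys  # stand-in for a heavy dependency, loaded on first call

    # Pretend the heavy module is needed to resolve the method.
    return f"{name}:{colorsys.__name__}"


if __name__ == "__main__":
    sys.modules.pop("colorsys", None)  # make the demo deterministic
    print("colorsys" in sys.modules)  # False: defining the function imported nothing
    print(resolve_quant_method("fp8"))  # fp8:colorsys
    print("colorsys" in sys.modules)  # True: cached for later calls
```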

Accuracy Tests

These changes should not affect model outputs.

Benchmarking and Profiling

Running:

for i in {1..100}; do (time python -B -c "from sglang.srt.managers.scheduler import Scheduler") 2>&1 | grep "^real"; done | python calc_avg.py

where calc_avg.py is a small helper script that summarizes the collected timings.
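The contents of calc_avg.py are not reproduced here. As an illustration only, a hypothetical stand-in could look like the following, assuming bash's default time output (lines such as "real 0m8.308s", with a tab after "real") and using the sample standard deviation; the original script may compute things differently.

```python
"""Hypothetical stand-in for calc_avg.py: summarize bash `time` output from stdin."""
import re
import statistics
import sys

# Matches bash's default format, e.g. "real\t0m8.308s".
REAL_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s")


def parse_seconds(lines):
    """Convert 'real XmY.YYYs' lines into floats (seconds); skip anything else."""
    times = []
    for line in lines:
        m = REAL_RE.match(line.strip())
        if m:
            times.append(int(m.group(1)) * 60 + float(m.group(2)))
    return times


def summarize(times):
    """Mean/median/sample stdev/min/max over the collected wall times."""
    return {
        "runs": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min": min(times),
        "max": max(times),
    }


def main():
    s = summarize(parse_seconds(sys.stdin))
    # Mimic the layout of the statistics blocks in this PR.
    print("=" * 11 + "Timing Statistics" + "=" * 12)
    print(f"Number of runs: {s['runs']}")
    for label, key in [("Mean", "mean"), ("Median", "median"),
                       ("Std Dev", "stdev"), ("Min", "min"), ("Max", "max")]:
        print(f"{label + ':':<10}{s[key]:.3f}s")
    print("=" * 40)


if __name__ == "__main__":
    main()
```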

With these changes:

===========Timing Statistics============
Number of runs: 100
Mean:     8.308s
Median:   8.236s
Std Dev:  0.297s
Min:      7.999s
Max:      9.329s
========================================

Compared to top-of-main:

===========Timing Statistics============
Number of runs: 100
Mean:     9.836s
Median:   9.790s
Std Dev:  0.332s
Min:      8.655s
Max:      11.801s
========================================

So we have a ~1.5 s improvement in mean wall time (9.836 s → 8.308 s, about 15%). That is not great, so I am going to keep working on it. I mostly targeted the time spent on creating the ModelConfig object. So far, the difference comes largely from removing the module-level quantization import:

Without these changes: import_sglang_tom.log

With these changes: import_sglang_new.log

Machine:

AMD EPYC 7343 16-Core Processor
L40S GPU

Checklist

Format your code according to Format code with pre-commit.
Add unit tests according to Run and add unit tests. (N/A?)
Update documentation according to Write documentations. (N/A?)
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed. (I will try to do this, although I am GPU poor.)

raayandhar · 2025-11-02T04:04:45Z

I will continue working on this; there are more improvements to be made.

python/sglang/srt/utils/hf_transformers_utils.py

python/sglang/srt/disaggregation/decode.py

raayandhar · 2025-11-06T06:34:13Z

python/sglang/srt/layers/moe/__init__.py

```diff
 )
+from sglang.utils import LazyImport
+
+MoeRunner = LazyImport("sglang.srt.layers.moe.moe_runner.runner", "MoeRunner")
```


I think this pattern (LazyImport(...)) is something we could apply to many other files as well. I'm not sure if there's a downside.

Not sure. @merrymercy @fzyzcjy, what are your opinions?
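On the downside question: a minimal sketch of how a lazy-import proxy like the LazyImport(...) call above can work. This is a hypothetical re-implementation written for illustration, not the actual sglang.utils.LazyImport code; only the two-argument (module path, attribute name) call shape is taken from the diff.

```python
import importlib
import sys
from typing import Any


class LazyImport:
    """Hypothetical proxy: imports `module_name` and fetches `attr_name` on first use."""

    def __init__(self, module_name: str, attr_name: str) -> None:
        self._module_name = module_name
        self._attr_name = attr_name
        self._target: Any = None  # resolved object, filled on first use

    def _load(self) -> Any:
        # Import the module and fetch the attribute only on first access.
        if self._target is None:
            module = importlib.import_module(self._module_name)
            self._target = getattr(module, self._attr_name)
        return self._target

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Calling the proxy (e.g. constructing a class) triggers the import.
        return self._load()(*args, **kwargs)

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not found on the proxy itself,
        # e.g. classmethods or constants of the target.
        if name in ("_module_name", "_attr_name", "_target"):
            raise AttributeError(name)  # avoid recursion if __init__ never ran
        return getattr(self._load(), name)


if __name__ == "__main__":
    sys.modules.pop("colorsys", None)  # keep the demo deterministic
    rgb_to_hsv = LazyImport("colorsys", "rgb_to_hsv")
    print("colorsys" in sys.modules)  # False: creating the proxy imported nothing
    print(rgb_to_hsv(1.0, 0.0, 0.0))  # first call imports colorsys
    print("colorsys" in sys.modules)  # True
```

The proxy defers the import until MoeRunner is first called or an attribute is read, which is where the import-time saving comes from. The catch is that the proxy is not the class itself: isinstance checks, subclassing, or evaluating an annotation such as MoeRunner | None at runtime will not behave as they would with the real class, and every call goes through one extra Python-level indirection. Those are the places to check before applying the pattern more broadly.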

python/sglang/srt/managers/scheduler.py

raayandhar · 2025-11-06T07:06:48Z

Hi experts,

At this point, the profiling shows a pretty good improvement. Looking at time python -X importtime -c "from sglang.srt.managers.scheduler import Scheduler" 2> import_sglang.log, we started at ~6000 ms overall and are now down to ~4000 ms; see below:
Top-of-main, newly updated (import_sglang_tom.log):

This is the improved version, with my changes (import_sglang_improved.log)

In words, we see an improvement of around 33%. In the improved version, most of what remains is transformers / torch imports, which are basically unavoidable without some extremely invasive changes. Many of the other imports have shrunk massively, e.g. model_config went from 4300 ms to 550 ms; see the logs for more details. I also have a far more aggressively optimized version (just to see what's possible) that improves the time even further, but those changes are really invasive and impractical.
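To pull per-module numbers such as the model_config figures out of the -X importtime logs, a small parser is enough. A hypothetical sketch, assuming the standard -X importtime line format (self and cumulative times in microseconds, separated by |); the default log path is just the file name used above.

```python
"""Hypothetical helper: rank modules by cumulative time in a -X importtime log."""
import sys

PREFIX = "import time:"


def parse_importtime(lines):
    """Return (module, self_us, cumulative_us) tuples from -X importtime output."""
    rows = []
    for line in lines:
        if not line.startswith(PREFIX):
            continue
        parts = line[len(PREFIX):].split("|")
        if len(parts) != 3:
            continue
        self_us, cum_us, name = (p.strip() for p in parts)
        if not self_us.isdigit():  # skip the "self [us] | cumulative | ..." header
            continue
        rows.append((name, int(self_us), int(cum_us)))
    return rows


def top_cumulative(rows, n=10):
    """The n entries with the largest cumulative import time."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "import_sglang.log"
    with open(path) as f:
        for name, _self_us, cum_us in top_cumulative(parse_importtime(f)):
            print(f"{cum_us / 1000:9.1f} ms  {name}")
```

Running it on both logs (top-of-main and improved) gives a text diff of the heaviest imports that is easier to compare than screenshots.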

The changes largely just move imports into functions so they are lazy-loaded, or move imports under a type-checking guard so they only run when type checking. As I commented earlier, not all of the changes are very pretty. My rationale is the following: nearly all of the functions I moved imports into will run only occasionally, or even just once at object initialization (e.g. many of the functions in hf_transformers_utils.py). Lazy loading there looks a bit uglier, but we reap real benefits at effectively no runtime cost. I left some more comments with other thoughts on how best to do this.

Also, this work is largely specific to the import path and object from the issue (Scheduler). I think these changes should help other paths as well (e.g. the changes to hf_transformers_utils.py should be useful more generally). If there are other paths that should be targeted, let me know and I will work on them. Furthermore, I think there is a lot of free lunch in two types of changes:

1. For imports that are only used as types, moving them under an if TYPE_CHECKING block seems to have no downside. A lot of the code already does this, but some parts don't, since I was able to find such changes along this path.
2. Using the LazyImport helper when possible. This issue has been described before (Slow import #606), and the helper is currently only used in sglang/__init__.py. There seems to be no downside to using it (but I could be misunderstanding, please let me know), so we could use it more broadly.

So these two changes could be used more broadly than just this path.
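A minimal sketch of the first pattern (type-only imports under TYPE_CHECKING). decimal.Decimal stands in for a heavy dependency and describe is a hypothetical function. It relies on from __future__ import annotations (or string annotations) so the annotation is never evaluated at runtime; code that resolves hints at runtime, for example via typing.get_type_hints, would raise NameError because the name is not defined outside type checking.

```python
from __future__ import annotations  # annotations stay unevaluated strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by mypy/pyright; skipped at runtime because TYPE_CHECKING is False.
    from decimal import Decimal


def describe(value: Decimal) -> str:
    # The annotation is never evaluated here, so Decimal need not be imported.
    return f"value={value}"


if __name__ == "__main__":
    print(describe(3))  # value=3
```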

At this point I'm opening this PR for review. I'm not sure it tackles exactly what the original issue was getting at, so I'd appreciate any clarification on what direction this PR should take. Thanks to the reviewers for taking the time to look at it!

hnyls2002 · 2025-11-10T04:24:44Z

@zhyncs @merrymercy Do you think this is needed?

zhyncs · 2025-11-10T04:27:58Z


This change does not seem to break existing functionality, so I think it is acceptable.

raayandhar · 2025-11-10T04:29:18Z

If it's acceptable, I'm happy to continue applying the "free lunch" changes I mentioned in the bigger comment above. But I defer to the experts.

zhyncs · 2025-11-10T04:29:51Z

The drawback is that it introduces some extra cognitive load. May you fix the conflicts first? Thanks!

raayandhar · 2025-11-10T04:30:26Z


Sure, will do.


hnyls2002 · 2025-11-15T16:48:48Z

Please fix the conflicts. @raayandhar

Also, I think this PR should be reviewed by @merrymercy.


raayandhar · 2025-11-16T04:42:23Z

I investigated the previous CI errors and, to the best of my understanding, they are unrelated, but I will monitor them carefully again.
The changes are fairly extensive across many files. One strategy I'm considering, to limit the cognitive load: when importing heavy classes, use LazyImport and keep it at the top level (perhaps with a note that the import is expensive, which is why we do it). When importing a util function with a heavy import path that is used only once in the whole file, keep the import at function level. And when an import is only used for type checking, put it under a TYPE_CHECKING block. I already do the last two, but I can update my PR to do the first as well. I think this is close in spirit to the earlier review, and it might be the cleanest way to handle this. Happy to discuss further.


raayandhar commented Nov 2, 2025 on python/sglang/srt/utils/hf_transformers_utils.py (outdated, resolved)

hnyls2002 self-assigned this Nov 3, 2025

raayandhar force-pushed the reduce-import-time branch 2 times, most recently from 9adc75a to 6ed8399, November 6, 2025 02:47

raayandhar commented Nov 6, 2025 on python/sglang/srt/disaggregation/decode.py (outdated, resolved)

raayandhar commented Nov 6, 2025 on python/sglang/srt/managers/scheduler.py (outdated, resolved)

raayandhar marked this pull request as ready for review November 6, 2025 07:09

raayandhar requested review from BBuf, ByronHsu, Edwardf0t1, HaiShaw, Ying1123, ch-wan, fzyzcjy, hnyls2002, ispobock, kushanam, merrymercy, xiezhq-hermann and zhyncs as code owners November 6, 2025 07:09

raayandhar force-pushed the reduce-import-time branch from 193544d to a627e16, November 10, 2025 04:49

raayandhar requested a review from Fridge003 as a code owner November 10, 2025 04:49

raayandhar added 16 commits November 14, 2025 18:09 (each signed off by Raayan Dhar):

bf71d4c moving configs into _register() improves time
bf404b8 bigger improvements, reaching 7.5s-8s
6e8b91c more improvements, down to 3900 ms
d61bca3 forgot to run precommit
63e0085 fix decode
b93149b move to lazy import
b217d2c refactor: move manager request types to separate file to avoid antipattern
8dc2b07 small tiny changes
9baae85 small cleanup
b6d0ed3 get the correct imports
e1ca102 fix server_args
46fa608 small fix to hf_transformer_utils
1d864e4 use lazyimport
bd0b92d handle OR and other checks
11b263b simplify
4986bee fix regression after rebase

raayandhar force-pushed the reduce-import-time branch from 401ae89 to 4986bee, November 15, 2025 02:09

hnyls2002 assigned merrymercy Nov 15, 2025

raayandhar and others added 3 commits November 15, 2025 13:21

319212a merge changes
4ee6d6c Merge branch 'main' into reduce-import-time
47342be Merge branch 'main' into reduce-import-time

raayandhar and others added 2 commits November 16, 2025 16:51

70839a0 Merge branch 'main' into reduce-import-time
96f2ec0 merge

raayandhar requested a review from ShangmingCai as a code owner November 18, 2025 02:53

3e53947 refactor work

raayandhar requested a review from DarkSharpness as a code owner November 18, 2025 05:16

raayandhar added 2 commits November 17, 2025 21:35

da1137c merge
ba111a5 fix issue about import path
